Overview

Dataset statistics

Statistic                       Dataset A   Dataset B
Number of variables             12          12
Number of observations          446         446
Missing cells                   417         435
Missing cells (%)               7.8%        8.1%
Duplicate rows                  0           0
Duplicate rows (%)              0.0%        0.0%
Total size in memory            45.3 KiB    45.3 KiB
Average record size in memory   104.0 B     104.0 B
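The two derived rows in this table can be checked by hand. The sketch below assumes that "Missing cells (%)" is the missing-cell count divided by rows times columns, and that the average record size is the total size (with 1 KiB = 1024 bytes) divided by the number of observations. Both assumptions are inferred from the reported figures, not taken from the profiler's documentation, and the function names are illustrative.

```python
# Figures copied from the "Dataset statistics" table.
DATASETS = {
    "Dataset A": {"variables": 12, "observations": 446, "missing_cells": 417},
    "Dataset B": {"variables": 12, "observations": 446, "missing_cells": 435},
}


def missing_cell_pct(stats: dict) -> float:
    """Share of missing cells among all cells (rows x columns), in percent."""
    total_cells = stats["variables"] * stats["observations"]
    return 100 * stats["missing_cells"] / total_cells


def avg_record_size_bytes(total_kib: float, observations: int) -> float:
    """Average bytes per row, assuming 1 KiB = 1024 bytes."""
    return total_kib * 1024 / observations


for name, stats in DATASETS.items():
    print(name, f"{missing_cell_pct(stats):.1f}%")
print(f"{avg_record_size_bytes(45.3, 446):.1f} B")
```

Under these assumptions the results round to the values in the table, which suggests the percentages are taken over all 5,352 cells rather than over rows.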

Variable types

Variable type   Dataset A   Dataset B
Numeric         5           5
Categorical     4           4
Text            3           3

Alerts

Dataset A                              Dataset B                              Alert type
Age has 74 (16.6%) missing values      Age has 90 (20.2%) missing values      Missing
Cabin has 343 (76.9%) missing values   Cabin has 345 (77.4%) missing values   Missing
PassengerId has unique values          PassengerId has unique values          Unique
Name has unique values                 Name has unique values                 Unique
SibSp has 302 (67.7%) zeros            SibSp has 305 (68.4%) zeros            Zeros
Parch has 346 (77.6%) zeros            Parch has 340 (76.2%) zeros            Zeros
Fare has 9 (2.0%) zeros                Fare has 5 (1.1%) zeros                Zeros

Reproduction

Setting                  Dataset A                    Dataset B
Analysis started         2024-02-26 14:53:39.358022   2024-02-26 14:53:43.744644
Analysis finished        2024-02-26 14:53:43.743559   2024-02-26 14:53:47.704923
Duration                 4.39 seconds                 3.96 seconds
Software version         ydata-profiling v0.0.dev0    ydata-profiling v0.0.dev0
Download configuration   config.json                  config.json
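The Duration row follows from the start and finish timestamps. A minimal check using only the standard library is sketched below; the helper name `duration_seconds` is illustrative.

```python
from datetime import datetime

# Timestamp layout used in the "Reproduction" table.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed wall-clock time between two report timestamps, in seconds."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


a = duration_seconds("2024-02-26 14:53:39.358022", "2024-02-26 14:53:43.743559")
b = duration_seconds("2024-02-26 14:53:43.744644", "2024-02-26 14:53:47.704923")
print(f"Dataset A: {a:.2f} s, Dataset B: {b:.2f} s")
```

The timestamps also show that Dataset B's analysis began about a millisecond after Dataset A's finished, so the two profiles appear to have been computed one after the other.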

Variables

PassengerId
Real number (ℝ)

Statistic      Dataset A   Dataset B
Distinct       446         446
Distinct (%)   100.0%      100.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           447.57623   445.80493
Minimum        1           1
Maximum        891         889
Zeros          0           0
Zeros (%)      0.0%        0.0%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     1           1
5-th percentile             48.5        33.25
Q1                          232.5       222.5
Median                      456         457.5
Q3                          664.75      665.5
95-th percentile            844.25      850
Maximum                     891         889
Range                       890         888
Interquartile range (IQR)   432.25      443
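The last two rows are derived from the others: the range is maximum minus minimum, and the IQR is Q3 minus Q1. A short sketch that recomputes both from the PassengerId quantiles (the dictionary layout and the name `spread` are illustrative):

```python
# Quantiles copied from the PassengerId "Quantile statistics" table.
QUANTILES = {
    "Dataset A": {"min": 1, "q1": 232.5, "q3": 664.75, "max": 891},
    "Dataset B": {"min": 1, "q1": 222.5, "q3": 665.5, "max": 889},
}


def spread(q: dict) -> dict:
    """Range (max - min) and interquartile range (Q3 - Q1)."""
    return {"range": q["max"] - q["min"], "iqr": q["q3"] - q["q1"]}


for name, q in QUANTILES.items():
    print(name, spread(q))
```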

Descriptive statistics

Statistic                         Dataset A      Dataset B
Standard deviation                257.3442       260.46221
Coefficient of variation (CV)     0.57497289     0.58425152
Kurtosis                          -1.1901699     -1.1936929
Mean                              447.57623      445.80493
Median Absolute Deviation (MAD)   215.5          222.5
Skewness                          -0.005753219   -0.036756423
Sum                               199619         198829
Variance                          66226.038      67840.562
Monotonicity                      Not monotonic  Not monotonic
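Several of these rows are related to one another. Assuming the usual definitions (CV = standard deviation / mean, variance = standard deviation squared), the reported PassengerId values can be cross-checked as follows. The figures are rounded in the report, so the comparison uses a tolerance.

```python
import math

# Values copied from the PassengerId "Descriptive statistics" table.
STATS = {
    "Dataset A": {"mean": 447.57623, "std": 257.3442,
                  "variance": 66226.038, "cv": 0.57497289},
    "Dataset B": {"mean": 445.80493, "std": 260.46221,
                  "variance": 67840.562, "cv": 0.58425152},
}


def coefficient_of_variation(mean: float, std: float) -> float:
    """Standard deviation expressed as a fraction of the mean."""
    return std / mean


for name, s in STATS.items():
    cv = coefficient_of_variation(s["mean"], s["std"])
    # Both checks allow for rounding in the printed report values.
    cv_ok = math.isclose(cv, s["cv"], abs_tol=1e-6)
    var_ok = math.isclose(s["std"] ** 2, s["variance"], rel_tol=1e-6)
    print(name, f"CV={cv:.6f}", cv_ok, var_ok)
```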
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
339 1
 
0.2%
42 1
 
0.2%
399 1
 
0.2%
112 1
 
0.2%
743 1
 
0.2%
209 1
 
0.2%
43 1
 
0.2%
275 1
 
0.2%
834 1
 
0.2%
595 1
 
0.2%
Other values (436) 436
97.8%
ValueCountFrequency (%)
46 1
 
0.2%
761 1
 
0.2%
42 1
 
0.2%
820 1
 
0.2%
28 1
 
0.2%
307 1
 
0.2%
256 1
 
0.2%
364 1
 
0.2%
202 1
 
0.2%
314 1
 
0.2%
Other values (436) 436
97.8%
ValueCountFrequency (%)
1 1
0.2%
2 1
0.2%
6 1
0.2%
8 1
0.2%
10 1
0.2%
11 1
0.2%
14 1
0.2%
17 1
0.2%
21 1
0.2%
22 1
0.2%
ValueCountFrequency (%)
1 1
0.2%
2 1
0.2%
3 1
0.2%
4 1
0.2%
5 1
0.2%
6 1
0.2%
7 1
0.2%
8 1
0.2%
10 1
0.2%
13 1
0.2%
ValueCountFrequency (%)
1 1
0.2%
2 1
0.2%
3 1
0.2%
4 1
0.2%
5 1
0.2%
6 1
0.2%
7 1
0.2%
8 1
0.2%
10 1
0.2%
13 1
0.2%
ValueCountFrequency (%)
1 1
0.2%
2 1
0.2%
6 1
0.2%
8 1
0.2%
10 1
0.2%
11 1
0.2%
14 1
0.2%
17 1
0.2%
21 1
0.2%
22 1
0.2%

Survived
Categorical

Statistic      Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value   Count (A)   Count (B)
0       267         277
1       179         169

Length

Statistic       Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      446         446
Distinct characters   2           2
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
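As a rough illustration of the character properties mentioned here, Python's standard `unicodedata` module exposes the general category of each code point as a two-letter code ("Nd" is Decimal Number, "Lu" Uppercase Letter, "Ll" Lowercase Letter, "Zs" Space Separator, "Po" Other Punctuation). The sketch below counts categories over a column of values. It is not the profiler's own implementation, and the standard library does not expose scripts or blocks, so only categories are shown.

```python
import unicodedata
from collections import Counter


def category_counts(values) -> Counter:
    """Count Unicode general categories over every character in `values`."""
    counts = Counter()
    for value in values:
        for ch in str(value):
            counts[unicodedata.category(ch)] += 1
    return counts


# A few values in the style of the Survived and Name columns.
print(category_counts(["1", "0", "1", "0", "1"]))
print(category_counts(["Dahl, Mr."]))
```

A column made up only of the digits 0 and 1 yields a single category, which is why the table above reports one distinct category for Survived.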

Unique

Statistic    Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

Row       Dataset A   Dataset B
1st row   1           0
2nd row   0           1
3rd row   1           0
4th row   0           0
5th row   1           1

Common Values

Value   Count (A)   Frequency (A)   Count (B)   Frequency (B)
0       267         59.9%           277         62.1%
1       179         40.1%           169         37.9%
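A value/count/frequency table of this kind can be reproduced with `collections.Counter`. The sketch below rebuilds the Survived column from the reported counts (the original rows are not part of this report) and rounds percentages to one decimal place, which is an assumption about how the report formats them.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows, most common value first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(value, count, round(100 * count / total, 1))
            for value, count in counts.most_common()]


# Columns rebuilt from the reported Survived counts.
survived_a = [0] * 267 + [1] * 179
survived_b = [0] * 277 + [1] * 169

for name, column in (("Dataset A", survived_a), ("Dataset B", survived_b)):
    print(name, frequency_table(column))
```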

Length

Histogram of lengths of the category (plot not reproduced)

Common Values (Plot)

Dataset A, Dataset B (plots not reproduced)
ValueCountFrequency (%)
0 267
59.9%
1 179
40.1%
ValueCountFrequency (%)
0 277
62.1%
1 169
37.9%

Most occurring characters

ValueCountFrequency (%)
0 267
59.9%
1 179
40.1%
ValueCountFrequency (%)
0 277
62.1%
1 169
37.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 446
100.0%
ValueCountFrequency (%)
Decimal Number 446
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 267
59.9%
1 179
40.1%
ValueCountFrequency (%)
0 277
62.1%
1 169
37.9%

Most occurring scripts

ValueCountFrequency (%)
Common 446
100.0%
ValueCountFrequency (%)
Common 446
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 267
59.9%
1 179
40.1%
ValueCountFrequency (%)
0 277
62.1%
1 169
37.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 446
100.0%
ValueCountFrequency (%)
ASCII 446
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 267
59.9%
1 179
40.1%
ValueCountFrequency (%)
0 277
62.1%
1 169
37.9%

Pclass
Categorical

Statistic      Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value   Count (A)   Count (B)
3       236         252
1       105         104
2       105         90

Length

Statistic       Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      446         446
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

Row       Dataset A   Dataset B
1st row   3           3
2nd row   1           2
3rd row   2           3
4th row   1           3
5th row   2           2

Common Values

Value   Count (A)   Frequency (A)   Count (B)   Frequency (B)
3       236         52.9%           252         56.5%
1       105         23.5%           104         23.3%
2       105         23.5%           90          20.2%

Length

Histogram of lengths of the category (plot not reproduced)

Common Values (Plot)

Dataset A, Dataset B (plots not reproduced)
ValueCountFrequency (%)
3 236
52.9%
1 105
23.5%
2 105
23.5%
ValueCountFrequency (%)
3 252
56.5%
1 104
23.3%
2 90
 
20.2%

Most occurring characters

ValueCountFrequency (%)
3 236
52.9%
1 105
23.5%
2 105
23.5%
ValueCountFrequency (%)
3 252
56.5%
1 104
23.3%
2 90
 
20.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 446
100.0%
ValueCountFrequency (%)
Decimal Number 446
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 236
52.9%
1 105
23.5%
2 105
23.5%
ValueCountFrequency (%)
3 252
56.5%
1 104
23.3%
2 90
 
20.2%

Most occurring scripts

ValueCountFrequency (%)
Common 446
100.0%
ValueCountFrequency (%)
Common 446
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3 236
52.9%
1 105
23.5%
2 105
23.5%
ValueCountFrequency (%)
3 252
56.5%
1 104
23.3%
2 90
 
20.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 446
100.0%
ValueCountFrequency (%)
ASCII 446
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 236
52.9%
1 105
23.5%
2 105
23.5%
ValueCountFrequency (%)
3 252
56.5%
1 104
23.3%
2 90
 
20.2%

Name
Text

Statistic      Dataset A   Dataset B
Distinct       446         446
Distinct (%)   100.0%      100.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

Statistic       Dataset A   Dataset B
Max length      67          67
Median length   49          50
Mean length     26.730942   27.107623
Min length      12          12

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      11922       12090
Distinct characters   59          60
Distinct categories   7           7
Distinct scripts      2           2
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       446         446
Unique (%)   100.0%      100.0%

Sample

Row       Dataset A                         Dataset B
1st row   Dahl, Mr. Karl Edwart             Rogers, Mr. William John
2nd row   Chaffee, Mr. Herbert Fuller       Mellors, Mr. William John
3rd row   Cameron, Miss. Clear Annie        Goodwin, Mrs. Frederick (Augusta Tyler)
4th row   Giglio, Mr. Victor                Olsen, Mr. Karl Siegwart Andreas
5th row   Richards, Master. George Sibley   Hewlett, Mrs. (Mary D Kingcome)
ValueCountFrequency (%)
mr 254
 
14.0%
miss 95
 
5.3%
mrs 66
 
3.7%
william 30
 
1.7%
john 25
 
1.4%
henry 20
 
1.1%
master 18
 
1.0%
james 14
 
0.8%
mary 12
 
0.7%
george 11
 
0.6%
Other values (878) 1263
69.9%
ValueCountFrequency (%)
mr 265
 
14.5%
miss 88
 
4.8%
mrs 66
 
3.6%
william 30
 
1.6%
master 20
 
1.1%
john 19
 
1.0%
henry 17
 
0.9%
thomas 17
 
0.9%
james 15
 
0.8%
edward 13
 
0.7%
Other values (890) 1281
70.0%

Most occurring characters

ValueCountFrequency (%)
1362
 
11.4%
r 973
 
8.2%
e 845
 
7.1%
a 790
 
6.6%
i 666
 
5.6%
n 638
 
5.4%
s 638
 
5.4%
M 560
 
4.7%
l 541
 
4.5%
o 509
 
4.3%
Other values (49) 4400
36.9%
ValueCountFrequency (%)
1387
 
11.5%
r 967
 
8.0%
e 868
 
7.2%
a 846
 
7.0%
n 652
 
5.4%
s 643
 
5.3%
i 635
 
5.3%
M 573
 
4.7%
o 514
 
4.3%
l 513
 
4.2%
Other values (50) 4492
37.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 7660
64.3%
Uppercase Letter 1816
 
15.2%
Space Separator 1362
 
11.4%
Other Punctuation 937
 
7.9%
Open Punctuation 70
 
0.6%
Close Punctuation 70
 
0.6%
Dash Punctuation 7
 
0.1%
ValueCountFrequency (%)
Lowercase Letter 7754
64.1%
Uppercase Letter 1846
 
15.3%
Space Separator 1387
 
11.5%
Other Punctuation 947
 
7.8%
Open Punctuation 75
 
0.6%
Close Punctuation 75
 
0.6%
Dash Punctuation 6
 
< 0.1%

Most frequent character per category

Space Separator
ValueCountFrequency (%)
1362
100.0%
ValueCountFrequency (%)
1387
100.0%
Lowercase Letter
ValueCountFrequency (%)
r 973
12.7%
e 845
11.0%
a 790
10.3%
i 666
8.7%
n 638
8.3%
s 638
8.3%
l 541
 
7.1%
o 509
 
6.6%
t 324
 
4.2%
h 254
 
3.3%
Other values (16) 1482
19.3%
ValueCountFrequency (%)
r 967
12.5%
e 868
11.2%
a 846
10.9%
n 652
8.4%
s 643
8.3%
i 635
8.2%
o 514
 
6.6%
l 513
 
6.6%
t 341
 
4.4%
h 267
 
3.4%
Other values (16) 1508
19.4%
Uppercase Letter
ValueCountFrequency (%)
M 560
30.8%
A 119
 
6.6%
J 113
 
6.2%
H 103
 
5.7%
S 88
 
4.8%
E 86
 
4.7%
B 78
 
4.3%
C 75
 
4.1%
W 69
 
3.8%
L 66
 
3.6%
Other values (15) 459
25.3%
ValueCountFrequency (%)
M 573
31.0%
A 125
 
6.8%
J 111
 
6.0%
H 101
 
5.5%
S 89
 
4.8%
C 85
 
4.6%
E 83
 
4.5%
B 73
 
4.0%
W 69
 
3.7%
L 68
 
3.7%
Other values (15) 469
25.4%
Other Punctuation
ValueCountFrequency (%)
. 447
47.7%
, 446
47.6%
" 42
 
4.5%
' 2
 
0.2%
ValueCountFrequency (%)
. 447
47.2%
, 446
47.1%
" 48
 
5.1%
' 5
 
0.5%
/ 1
 
0.1%
Open Punctuation
ValueCountFrequency (%)
( 70
100.0%
ValueCountFrequency (%)
( 75
100.0%
Close Punctuation
ValueCountFrequency (%)
) 70
100.0%
ValueCountFrequency (%)
) 75
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 7
100.0%
ValueCountFrequency (%)
- 6
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 9476
79.5%
Common 2446
 
20.5%
ValueCountFrequency (%)
Latin 9600
79.4%
Common 2490
 
20.6%

Most frequent character per script

Common
ValueCountFrequency (%)
1362
55.7%
. 447
 
18.3%
, 446
 
18.2%
( 70
 
2.9%
) 70
 
2.9%
" 42
 
1.7%
- 7
 
0.3%
' 2
 
0.1%
ValueCountFrequency (%)
1387
55.7%
. 447
 
18.0%
, 446
 
17.9%
( 75
 
3.0%
) 75
 
3.0%
" 48
 
1.9%
- 6
 
0.2%
' 5
 
0.2%
/ 1
 
< 0.1%
Latin
ValueCountFrequency (%)
r 973
 
10.3%
e 845
 
8.9%
a 790
 
8.3%
i 666
 
7.0%
n 638
 
6.7%
s 638
 
6.7%
M 560
 
5.9%
l 541
 
5.7%
o 509
 
5.4%
t 324
 
3.4%
Other values (41) 2992
31.6%
ValueCountFrequency (%)
r 967
 
10.1%
e 868
 
9.0%
a 846
 
8.8%
n 652
 
6.8%
s 643
 
6.7%
i 635
 
6.6%
M 573
 
6.0%
o 514
 
5.4%
l 513
 
5.3%
t 341
 
3.6%
Other values (41) 3048
31.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 11922
100.0%
ValueCountFrequency (%)
ASCII 12090
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1362
 
11.4%
r 973
 
8.2%
e 845
 
7.1%
a 790
 
6.6%
i 666
 
5.6%
n 638
 
5.4%
s 638
 
5.4%
M 560
 
4.7%
l 541
 
4.5%
o 509
 
4.3%
Other values (49) 4400
36.9%
ValueCountFrequency (%)
1387
 
11.5%
r 967
 
8.0%
e 868
 
7.2%
a 846
 
7.0%
n 652
 
5.4%
s 643
 
5.3%
i 635
 
5.3%
M 573
 
4.7%
o 514
 
4.3%
l 513
 
4.2%
Other values (50) 4492
37.2%

Sex
Categorical

Statistic      Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value    Count (A)   Count (B)
male     284         291
female   162         155

Length

Statistic       Dataset A   Dataset B
Max length      6           6
Median length   4           4
Mean length     4.7264574   4.6950673
Min length      4           4
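For a categorical column with only two values, the mean length and the total character count follow directly from the category counts. The sketch below uses the male/female counts reported for Sex; the helper name `length_stats` is illustrative.

```python
def length_stats(counts: dict) -> tuple:
    """Return (total characters, mean string length) for {value: row count}."""
    total_rows = sum(counts.values())
    total_chars = sum(len(value) * n for value, n in counts.items())
    return total_chars, total_chars / total_rows


# Category counts reported for the Sex column.
sex_a = {"male": 284, "female": 162}
sex_b = {"male": 291, "female": 155}

for name, counts in (("Dataset A", sex_a), ("Dataset B", sex_b)):
    chars, mean_len = length_stats(counts)
    print(name, chars, f"{mean_len:.7f}")
```

The totals come out at 2108 and 2094 characters, which agrees with the "Total characters" row reported for this variable.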

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      2108        2094
Distinct characters   5           5
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

Row       Dataset A   Dataset B
1st row   male        male
2nd row   male        male
3rd row   female      female
4th row   male        male
5th row   male        female

Common Values

Value    Count (A)   Frequency (A)   Count (B)   Frequency (B)
male     284         63.7%           291         65.2%
female   162         36.3%           155         34.8%

Length

Histogram of lengths of the category (plot not reproduced)

Common Values (Plot)

Dataset A, Dataset B (plots not reproduced)
ValueCountFrequency (%)
male 284
63.7%
female 162
36.3%
ValueCountFrequency (%)
male 291
65.2%
female 155
34.8%

Most occurring characters

ValueCountFrequency (%)
e 608
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 162
 
7.7%
ValueCountFrequency (%)
e 601
28.7%
m 446
21.3%
a 446
21.3%
l 446
21.3%
f 155
 
7.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2108
100.0%
ValueCountFrequency (%)
Lowercase Letter 2094
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 608
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 162
 
7.7%
ValueCountFrequency (%)
e 601
28.7%
m 446
21.3%
a 446
21.3%
l 446
21.3%
f 155
 
7.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 2108
100.0%
ValueCountFrequency (%)
Latin 2094
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 608
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 162
 
7.7%
ValueCountFrequency (%)
e 601
28.7%
m 446
21.3%
a 446
21.3%
l 446
21.3%
f 155
 
7.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2108
100.0%
ValueCountFrequency (%)
ASCII 2094
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 608
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 162
 
7.7%
ValueCountFrequency (%)
e 601
28.7%
m 446
21.3%
a 446
21.3%
l 446
21.3%
f 155
 
7.4%

Age
Real number (ℝ)

Statistic      Dataset A   Dataset B
Distinct       79          73
Distinct (%)   21.2%       20.5%
Missing        74          90
Missing (%)    16.6%       20.2%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           29.730054   30.452725
Minimum        0.67        0.42
Maximum        74          71
Zeros          0           0
Zeros (%)      0.0%        0.0%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB
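The two percentage rows for Age appear to use different denominators: "Missing (%)" matches missing values over all 446 rows, while "Distinct (%)" only matches if the distinct count is divided by the non-missing rows. This is inferred from the reported numbers rather than from the profiler's documentation. A short check:

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Figures copied from the Age summary table.
AGE = {
    "Dataset A": {"rows": 446, "missing": 74, "distinct": 79},
    "Dataset B": {"rows": 446, "missing": 90, "distinct": 73},
}

for name, s in AGE.items():
    missing_pct = pct(s["missing"], s["rows"])
    # Assumption: distinct values are counted among non-missing rows only.
    distinct_pct = pct(s["distinct"], s["rows"] - s["missing"])
    print(name, f"missing={missing_pct}%", f"distinct={distinct_pct}%")
```

Dividing the distinct counts by all 446 rows instead would give 17.7% and 16.4%, which does not match the table.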

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     0.67        0.42
5-th percentile             4.55        5
Q1                          20.375      21
Median                      29          29
Q3                          39          39
95-th percentile            53.45       56
Maximum                     74          71
Range                       73.33       70.58
Interquartile range (IQR)   18.625      18

Descriptive statistics

Statistic                         Dataset A       Dataset B
Standard deviation                14.105994       14.422212
Coefficient of variation (CV)     0.47446916      0.47359349
Kurtosis                          -0.04423976     0.028659718
Mean                              29.730054       30.452725
Median Absolute Deviation (MAD)   9               9
Skewness                          0.24681164      0.32642933
Sum                               11059.58        10841.17
Variance                          198.97906       208.00021
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
21 16
 
3.6%
19 16
 
3.6%
24 15
 
3.4%
29 13
 
2.9%
30 12
 
2.7%
22 12
 
2.7%
18 12
 
2.7%
36 12
 
2.7%
28 11
 
2.5%
32 11
 
2.5%
Other values (69) 242
54.3%
(Missing) 74
 
16.6%
ValueCountFrequency (%)
22 16
 
3.6%
18 14
 
3.1%
19 14
 
3.1%
24 13
 
2.9%
29 11
 
2.5%
27 11
 
2.5%
25 11
 
2.5%
26 10
 
2.2%
36 10
 
2.2%
16 10
 
2.2%
Other values (63) 236
52.9%
(Missing) 90
 
20.2%
ValueCountFrequency (%)
0.67 1
 
0.2%
0.75 1
 
0.2%
0.83 2
 
0.4%
1 3
0.7%
2 5
1.1%
3 3
0.7%
4 4
0.9%
5 3
0.7%
6 2
 
0.4%
7 1
 
0.2%
ValueCountFrequency (%)
0.42 1
 
0.2%
0.75 1
 
0.2%
1 3
0.7%
2 3
0.7%
3 2
 
0.4%
4 6
1.3%
5 4
0.9%
6 2
 
0.4%
8 3
0.7%
9 4
0.9%
ValueCountFrequency (%)
0.42 1
 
0.2%
0.75 1
 
0.2%
1 3
0.7%
2 3
0.7%
3 2
 
0.4%
4 6
1.3%
5 4
0.9%
6 2
 
0.4%
8 3
0.7%
9 4
0.9%
ValueCountFrequency (%)
0.67 1
 
0.2%
0.75 1
 
0.2%
0.83 2
 
0.4%
1 3
0.7%
2 5
1.1%
3 3
0.7%
4 4
0.9%
5 3
0.7%
6 2
 
0.4%
7 1
 
0.2%

SibSp
Real number (ℝ)

Statistic      Dataset A    Dataset B
Distinct       7            7
Distinct (%)   1.6%         1.6%
Missing        0            0
Missing (%)    0.0%         0.0%
Infinite       0            0
Infinite (%)   0.0%         0.0%
Mean           0.50224215   0.55829596
Minimum        0            0
Maximum        8            8
Zeros          302          305
Zeros (%)      67.7%        68.4%
Negative       0            0
Negative (%)   0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
Median                      0           0
Q3                          1           1
95-th percentile            2           3
Maximum                     8           8
Range                       8           8
Interquartile range (IQR)   1           1

Descriptive statistics

Statistic                         Dataset A       Dataset B
Standard deviation                1.0139452       1.2029779
Coefficient of variation (CV)     2.0188373       2.1547314
Kurtosis                          16.870595       16.759713
Mean                              0.50224215      0.55829596
Median Absolute Deviation (MAD)   0               0
Skewness                          3.5084051       3.6627275
Sum                               224             249
Variance                          1.0280848       1.4471557
Monotonicity                      Not monotonic   Not monotonic
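Because SibSp takes only seven distinct values, the Dataset A column can be rebuilt from the value counts reported for this variable and the statistics recomputed with the standard library. The sketch assumes the report uses the sample (n - 1) variance, which is the form that matches the printed figure, and that MAD means the median of absolute deviations from the median.

```python
import math
import statistics

# Value -> count for SibSp in Dataset A, copied from the report.
sibsp_counts_a = {0: 302, 1: 109, 2: 15, 3: 6, 4: 9, 5: 3, 8: 2}
values = [v for v, n in sibsp_counts_a.items() for _ in range(n)]

mean = statistics.fmean(values)
variance = statistics.variance(values)  # sample variance, divides by n - 1
std = math.sqrt(variance)
median = statistics.median(values)
# Median absolute deviation around the median.
mad = statistics.median(abs(v - median) for v in values)

print(f"sum={sum(values)} mean={mean:.8f} variance={variance:.7f}")
print(f"std={std:.7f} median={median} mad={mad}")
```

More than half of the rows are zero, so both the median and the MAD are 0 even though the mean is about 0.5.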
Histogram with fixed size bins (bins=7)
ValueCountFrequency (%)
0 302
67.7%
1 109
 
24.4%
2 15
 
3.4%
4 9
 
2.0%
3 6
 
1.3%
5 3
 
0.7%
8 2
 
0.4%
ValueCountFrequency (%)
0 305
68.4%
1 99
 
22.2%
2 16
 
3.6%
4 9
 
2.0%
3 9
 
2.0%
8 5
 
1.1%
5 3
 
0.7%
ValueCountFrequency (%)
0 302
67.7%
1 109
 
24.4%
2 15
 
3.4%
3 6
 
1.3%
4 9
 
2.0%
5 3
 
0.7%
8 2
 
0.4%
ValueCountFrequency (%)
0 305
68.4%
1 99
 
22.2%
2 16
 
3.6%
3 9
 
2.0%
4 9
 
2.0%
5 3
 
0.7%
8 5
 
1.1%
ValueCountFrequency (%)
0 305
68.4%
1 99
 
22.2%
2 16
 
3.6%
3 9
 
2.0%
4 9
 
2.0%
5 3
 
0.7%
8 5
 
1.1%
ValueCountFrequency (%)
0 302
67.7%
1 109
 
24.4%
2 15
 
3.4%
3 6
 
1.3%
4 9
 
2.0%
5 3
 
0.7%
8 2
 
0.4%

Parch
Real number (ℝ)

Statistic      Dataset A    Dataset B
Distinct       6            7
Distinct (%)   1.3%         1.6%
Missing        0            0
Missing (%)    0.0%         0.0%
Infinite       0            0
Infinite (%)   0.0%         0.0%
Mean           0.35201794   0.40134529
Minimum        0            0
Maximum        5            6
Zeros          346          340
Zeros (%)      77.6%        76.2%
Negative       0            0
Negative (%)   0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
Median                      0           0
Q3                          0           0
95-th percentile            2           2
Maximum                     5           6
Range                       5           6
Interquartile range (IQR)   0           0

Descriptive statistics

Statistic                         Dataset A       Dataset B
Standard deviation                0.75807738      0.88832527
Coefficient of variation (CV)     2.1535192       2.2133691
Kurtosis                          8.6256736       11.399706
Mean                              0.35201794      0.40134529
Median Absolute Deviation (MAD)   0               0
Skewness                          2.6398298       3.0441951
Sum                               157             179
Variance                          0.57468131      0.78912178
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=6)
ValueCountFrequency (%)
0 346
77.6%
1 55
 
12.3%
2 39
 
8.7%
5 2
 
0.4%
3 2
 
0.4%
4 2
 
0.4%
ValueCountFrequency (%)
0 340
76.2%
1 59
 
13.2%
2 36
 
8.1%
5 5
 
1.1%
3 3
 
0.7%
4 2
 
0.4%
6 1
 
0.2%
ValueCountFrequency (%)
0 346
77.6%
1 55
 
12.3%
2 39
 
8.7%
3 2
 
0.4%
4 2
 
0.4%
5 2
 
0.4%
ValueCountFrequency (%)
0 340
76.2%
1 59
 
13.2%
2 36
 
8.1%
3 3
 
0.7%
4 2
 
0.4%
5 5
 
1.1%
6 1
 
0.2%
ValueCountFrequency (%)
0 340
76.2%
1 59
 
13.2%
2 36
 
8.1%
3 3
 
0.7%
4 2
 
0.4%
5 5
 
1.1%
6 1
 
0.2%
ValueCountFrequency (%)
0 346
77.6%
1 55
 
12.3%
2 39
 
8.7%
3 2
 
0.4%
4 2
 
0.4%
5 2
 
0.4%

Ticket
Text

Statistic      Dataset A   Dataset B
Distinct       371         375
Distinct (%)   83.2%       84.1%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

Statistic       Dataset A   Dataset B
Max length      18          18
Median length   17          17
Mean length     6.7847534   6.8565022
Min length      3           3

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      3026        3058
Distinct characters   35          35
Distinct categories   5           5
Distinct scripts      2           2
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       312         327
Unique (%)   70.0%       73.3%

Sample

Row       Dataset A      Dataset B
1st row   7598           S.C./A.4. 23567
2nd row   W.E.P. 5734    SW/PP 751
3rd row   F.C.C. 13528   CA 2144
4th row   PC 17593       4579
5th row   29106          248706
ValueCountFrequency (%)
pc 33
 
5.7%
c.a 15
 
2.6%
a/5 12
 
2.1%
2 6
 
1.0%
ston/o 6
 
1.0%
w./c 5
 
0.9%
sc/paris 5
 
0.9%
c 5
 
0.9%
ca 5
 
0.9%
soton/o.q 4
 
0.7%
Other values (392) 479
83.3%
ValueCountFrequency (%)
pc 29
 
5.0%
c.a 18
 
3.1%
ca 10
 
1.7%
a/5 7
 
1.2%
w./c 7
 
1.2%
2 6
 
1.0%
ston/o 6
 
1.0%
sc/paris 6
 
1.0%
347088 5
 
0.9%
soton/o.q 5
 
0.9%
Other values (394) 476
82.8%

Most occurring characters

ValueCountFrequency (%)
3 354
11.7%
1 335
11.1%
2 287
9.5%
7 253
8.4%
4 235
 
7.8%
6 220
 
7.3%
5 201
 
6.6%
0 197
 
6.5%
9 161
 
5.3%
8 151
 
5.0%
Other values (25) 632
20.9%
ValueCountFrequency (%)
3 376
12.3%
1 330
10.8%
2 287
9.4%
7 249
 
8.1%
4 229
 
7.5%
6 210
 
6.9%
0 203
 
6.6%
5 199
 
6.5%
9 171
 
5.6%
8 136
 
4.4%
Other values (25) 668
21.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 2394
79.1%
Uppercase Letter 324
 
10.7%
Other Punctuation 163
 
5.4%
Space Separator 129
 
4.3%
Lowercase Letter 16
 
0.5%
ValueCountFrequency (%)
Decimal Number 2390
78.2%
Uppercase Letter 350
 
11.4%
Other Punctuation 173
 
5.7%
Space Separator 129
 
4.2%
Lowercase Letter 16
 
0.5%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 354
14.8%
1 335
14.0%
2 287
12.0%
7 253
10.6%
4 235
9.8%
6 220
9.2%
5 201
8.4%
0 197
8.2%
9 161
6.7%
8 151
6.3%
ValueCountFrequency (%)
3 376
15.7%
1 330
13.8%
2 287
12.0%
7 249
10.4%
4 229
9.6%
6 210
8.8%
0 203
8.5%
5 199
8.3%
9 171
7.2%
8 136
 
5.7%
Space Separator
ValueCountFrequency (%)
129
100.0%
ValueCountFrequency (%)
129
100.0%
Other Punctuation
ValueCountFrequency (%)
. 113
69.3%
/ 50
30.7%
ValueCountFrequency (%)
. 120
69.4%
/ 53
30.6%
Uppercase Letter
ValueCountFrequency (%)
C 83
25.6%
P 55
17.0%
A 45
13.9%
O 44
13.6%
S 33
 
10.2%
N 17
 
5.2%
T 15
 
4.6%
W 7
 
2.2%
Q 6
 
1.9%
F 5
 
1.5%
Other values (6) 14
 
4.3%
ValueCountFrequency (%)
C 87
24.9%
O 51
14.6%
A 48
13.7%
P 48
13.7%
S 39
11.1%
N 20
 
5.7%
T 18
 
5.1%
W 11
 
3.1%
Q 8
 
2.3%
I 5
 
1.4%
Other values (6) 15
 
4.3%
Lowercase Letter
ValueCountFrequency (%)
a 4
25.0%
s 4
25.0%
r 3
18.8%
i 3
18.8%
l 1
 
6.2%
e 1
 
6.2%
ValueCountFrequency (%)
a 4
25.0%
s 4
25.0%
i 3
18.8%
r 3
18.8%
l 1
 
6.2%
e 1
 
6.2%

Most occurring scripts

ValueCountFrequency (%)
Common 2686
88.8%
Latin 340
 
11.2%
ValueCountFrequency (%)
Common 2692
88.0%
Latin 366
 
12.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3 354
13.2%
1 335
12.5%
2 287
10.7%
7 253
9.4%
4 235
8.7%
6 220
8.2%
5 201
7.5%
0 197
7.3%
9 161
6.0%
8 151
5.6%
Other values (3) 292
10.9%
ValueCountFrequency (%)
3 376
14.0%
1 330
12.3%
2 287
10.7%
7 249
9.2%
4 229
8.5%
6 210
7.8%
0 203
7.5%
5 199
7.4%
9 171
6.4%
8 136
 
5.1%
Other values (3) 302
11.2%
Latin
ValueCountFrequency (%)
C 83
24.4%
P 55
16.2%
A 45
13.2%
O 44
12.9%
S 33
 
9.7%
N 17
 
5.0%
T 15
 
4.4%
W 7
 
2.1%
Q 6
 
1.8%
F 5
 
1.5%
Other values (12) 30
 
8.8%
ValueCountFrequency (%)
C 87
23.8%
O 51
13.9%
A 48
13.1%
P 48
13.1%
S 39
10.7%
N 20
 
5.5%
T 18
 
4.9%
W 11
 
3.0%
Q 8
 
2.2%
I 5
 
1.4%
Other values (12) 31
 
8.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3026
100.0%
ValueCountFrequency (%)
ASCII 3058
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 354
11.7%
1 335
11.1%
2 287
9.5%
7 253
8.4%
4 235
 
7.8%
6 220
 
7.3%
5 201
 
6.6%
0 197
 
6.5%
9 161
 
5.3%
8 151
 
5.0%
Other values (25) 632
20.9%
ValueCountFrequency (%)
3 376
12.3%
1 330
10.8%
2 287
9.4%
7 249
 
8.1%
4 229
 
7.5%
6 210
 
6.9%
0 203
 
6.6%
5 199
 
6.5%
9 171
 
5.6%
8 136
 
4.4%
Other values (25) 668
21.8%

Fare
Real number (ℝ)

Statistic      Dataset A   Dataset B
Distinct       172         177
Distinct (%)   38.6%       39.7%
Missing        0           0
Missing (%)    0.0%        0.0%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           33.150223   33.447393
Minimum        0           0
Maximum        512.3292    512.3292
Zeros          9           5
Zeros (%)      2.0%        1.1%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     0           0
5-th percentile             7.225       7.225
Q1                          7.925       8.05
Median                      13.64585    14.5
Q3                          31.359375   31.3875
95-th percentile            130.2375    112.67708
Maximum                     512.3292    512.3292
Range                       512.3292    512.3292
Interquartile range (IQR)   23.434375   23.3375

Descriptive statistics

Statistic                         Dataset A       Dataset B
Standard deviation                49.965571       52.921279
Coefficient of variation (CV)     1.5072469       1.5822244
Kurtosis                          24.57699        33.309281
Mean                              33.150223       33.447393
Median Absolute Deviation (MAD)   6.39585         7.25
Skewness                          4.0853779       4.8818584
Sum                               14784.999       14917.537
Variance                          2496.5583       2800.6618
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
13 24
 
5.4%
8.05 22
 
4.9%
10.5 18
 
4.0%
7.75 18
 
4.0%
7.8958 15
 
3.4%
26 14
 
3.1%
7.25 10
 
2.2%
0 9
 
2.0%
7.775 8
 
1.8%
7.925 8
 
1.8%
Other values (162) 300
67.3%
ValueCountFrequency (%)
8.05 23
 
5.2%
10.5 19
 
4.3%
7.8958 18
 
4.0%
13 16
 
3.6%
7.75 15
 
3.4%
26 14
 
3.1%
7.25 9
 
2.0%
26.55 8
 
1.8%
8.6625 8
 
1.8%
7.925 8
 
1.8%
Other values (167) 308
69.1%
ValueCountFrequency (%)
0 9
2.0%
4.0125 1
 
0.2%
5 1
 
0.2%
6.45 1
 
0.2%
6.4958 1
 
0.2%
6.75 1
 
0.2%
6.975 2
 
0.4%
7.05 3
 
0.7%
7.125 1
 
0.2%
7.1417 1
 
0.2%
ValueCountFrequency (%)
0 5
1.1%
5 1
 
0.2%
6.4958 1
 
0.2%
6.75 1
 
0.2%
6.8583 1
 
0.2%
6.975 2
 
0.4%
7.0458 1
 
0.2%
7.05 4
0.9%
7.125 1
 
0.2%
7.1417 1
 
0.2%
ValueCountFrequency (%)
0 5
1.1%
5 1
 
0.2%
6.4958 1
 
0.2%
6.75 1
 
0.2%
6.8583 1
 
0.2%
6.975 2
 
0.4%
7.0458 1
 
0.2%
7.05 4
0.9%
7.125 1
 
0.2%
7.1417 1
 
0.2%
ValueCountFrequency (%)
0 9
2.0%
4.0125 1
 
0.2%
5 1
 
0.2%
6.45 1
 
0.2%
6.4958 1
 
0.2%
6.75 1
 
0.2%
6.975 2
 
0.4%
7.05 3
 
0.7%
7.125 1
 
0.2%
7.1417 1
 
0.2%

Cabin
Text

Statistic      Dataset A   Dataset B
Distinct       82          91
Distinct (%)   79.6%       90.1%
Missing        343         345
Missing (%)    76.9%       77.4%
Memory size    7.0 KiB     7.0 KiB

Length

Statistic       Dataset A   Dataset B
Max length      15          15
Median length   3           3
Mean length     3.6990291   3.5643564
Min length      1           1

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      381         360
Distinct characters   19          19
Distinct categories   3           3
Distinct scripts      2           2
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       63          81
Unique (%)   61.2%       80.2%

Sample

Row       Dataset A     Dataset B
1st row   E31           E101
2nd row   B86           C47
3rd row   B51 B53 B55   E77
4th row   E63           D19
5th row   A10           C126
ValueCountFrequency (%)
d 3
 
2.4%
b98 3
 
2.4%
b96 3
 
2.4%
b66 2
 
1.6%
e67 2
 
1.6%
f33 2
 
1.6%
c27 2
 
1.6%
c25 2
 
1.6%
c23 2
 
1.6%
b58 2
 
1.6%
Other values (84) 100
81.3%
ValueCountFrequency (%)
b18 2
 
1.7%
b51 2
 
1.7%
c23 2
 
1.7%
e101 2
 
1.7%
c123 2
 
1.7%
d 2
 
1.7%
b55 2
 
1.7%
b53 2
 
1.7%
c125 2
 
1.7%
b49 2
 
1.7%
Other values (91) 97
82.9%

Most occurring characters

ValueCountFrequency (%)
B 38
 
10.0%
C 34
 
8.9%
1 34
 
8.9%
2 33
 
8.7%
3 29
 
7.6%
6 28
 
7.3%
5 25
 
6.6%
20
 
5.2%
0 20
 
5.2%
D 19
 
5.0%
Other values (9) 101
26.5%
ValueCountFrequency (%)
1 38
10.6%
C 34
 
9.4%
B 33
 
9.2%
3 32
 
8.9%
5 28
 
7.8%
2 26
 
7.2%
6 20
 
5.6%
4 19
 
5.3%
D 19
 
5.3%
8 18
 
5.0%
Other values (9) 93
25.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 238
62.5%
Uppercase Letter 123
32.3%
Space Separator 20
 
5.2%
ValueCountFrequency (%)
Decimal Number 227
63.1%
Uppercase Letter 117
32.5%
Space Separator 16
 
4.4%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
B 38
30.9%
C 34
27.6%
D 19
15.4%
E 19
15.4%
F 6
 
4.9%
G 3
 
2.4%
A 3
 
2.4%
T 1
 
0.8%
ValueCountFrequency (%)
C 34
29.1%
B 33
28.2%
D 19
16.2%
E 16
13.7%
A 6
 
5.1%
F 6
 
5.1%
G 2
 
1.7%
T 1
 
0.9%
Decimal Number
ValueCountFrequency (%)
1 34
14.3%
2 33
13.9%
3 29
12.2%
6 28
11.8%
5 25
10.5%
0 20
8.4%
8 19
8.0%
4 17
7.1%
7 17
7.1%
9 16
6.7%
ValueCountFrequency (%)
1 38
16.7%
3 32
14.1%
5 28
12.3%
2 26
11.5%
6 20
8.8%
4 19
8.4%
8 18
7.9%
7 17
7.5%
9 17
7.5%
0 12
 
5.3%
Space Separator
ValueCountFrequency (%)
20
100.0%
ValueCountFrequency (%)
16
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 258
67.7%
Latin 123
32.3%
ValueCountFrequency (%)
Common 243
67.5%
Latin 117
32.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
B 38
30.9%
C 34
27.6%
D 19
15.4%
E 19
15.4%
F 6
 
4.9%
G 3
 
2.4%
A 3
 
2.4%
T 1
 
0.8%
ValueCountFrequency (%)
C 34
29.1%
B 33
28.2%
D 19
16.2%
E 16
13.7%
A 6
 
5.1%
F 6
 
5.1%
G 2
 
1.7%
T 1
 
0.9%
Common
ValueCountFrequency (%)
1 34
13.2%
2 33
12.8%
3 29
11.2%
6 28
10.9%
5 25
9.7%
20
7.8%
0 20
7.8%
8 19
7.4%
4 17
6.6%
7 17
6.6%
ValueCountFrequency (%)
1 38
15.6%
3 32
13.2%
5 28
11.5%
2 26
10.7%
6 20
8.2%
4 19
7.8%
8 18
7.4%
7 17
7.0%
9 17
7.0%
16
6.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 381
100.0%
ValueCountFrequency (%)
ASCII 360
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
B 38
 
10.0%
C 34
 
8.9%
1 34
 
8.9%
2 33
 
8.7%
3 29
 
7.6%
6 28
 
7.3%
5 25
 
6.6%
20
 
5.2%
0 20
 
5.2%
D 19
 
5.0%
Other values (9) 101
26.5%
ValueCountFrequency (%)
1 38
10.6%
C 34
 
9.4%
B 33
 
9.2%
3 32
 
8.9%
5 28
 
7.8%
2 26
 
7.2%
6 20
 
5.6%
4 19
 
5.3%
D 19
 
5.3%
8 18
 
5.0%
Other values (9) 93
25.8%

Embarked
Categorical

Statistic      Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value   Count (A)   Count (B)
S       333         321
C       80          83
Q       33          42

Length

Statistic       Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      446         446
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

Row       Dataset A   Dataset B
1st row   S           S
2nd row   S           S
3rd row   S           S
4th row   C           S
5th row   S           S

Common Values

Value   Count (A)   Frequency (A)   Count (B)   Frequency (B)
S       333         74.7%           321         72.0%
C       80          17.9%           83          18.6%
Q       33          7.4%            42          9.4%

Length

Histogram of lengths of the category (plot not reproduced)

Common Values (Plot)

Dataset A, Dataset B (plots not reproduced)
ValueCountFrequency (%)
s 333
74.7%
c 80
 
17.9%
q 33
 
7.4%
ValueCountFrequency (%)
s 321
72.0%
c 83
 
18.6%
q 42
 
9.4%

Most occurring characters

ValueCountFrequency (%)
S 333
74.7%
C 80
 
17.9%
Q 33
 
7.4%
ValueCountFrequency (%)
S 321
72.0%
C 83
 
18.6%
Q 42
 
9.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 446
100.0%
ValueCountFrequency (%)
Uppercase Letter 446
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 333
74.7%
C 80
 
17.9%
Q 33
 
7.4%
ValueCountFrequency (%)
S 321
72.0%
C 83
 
18.6%
Q 42
 
9.4%

Most occurring scripts

Dataset A
Value | Count | Frequency (%)
Latin | 446 | 100.0%

Dataset B
Value | Count | Frequency (%)
Latin | 446 | 100.0%

Most frequent character per script

Latin
Dataset A
Value | Count | Frequency (%)
S | 333 | 74.7%
C | 80 | 17.9%
Q | 33 | 7.4%

Dataset B
Value | Count | Frequency (%)
S | 321 | 72.0%
C | 83 | 18.6%
Q | 42 | 9.4%

Most occurring blocks

Dataset A
Value | Count | Frequency (%)
ASCII | 446 | 100.0%

Dataset B
Value | Count | Frequency (%)
ASCII | 446 | 100.0%

Most frequent character per block

ASCII
Dataset A
Value | Count | Frequency (%)
S | 333 | 74.7%
C | 80 | 17.9%
Q | 33 | 7.4%

Dataset B
Value | Count | Frequency (%)
S | 321 | 72.0%
C | 83 | 18.6%
Q | 42 | 9.4%

Interactions

[Interaction plots for Dataset A and Dataset B, 25 panels per dataset; figures not reproduced]

Missing values

Dataset A

A simple visualization of nullity by column.

Dataset B

A simple visualization of nullity by column.

Dataset A

Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Dataset B

Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
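
Both views can be approximated without plotting. The sketch below is a simplified illustration, assuming rows are plain dictionaries with `None` standing in for the NaN cells; `nullity_by_column` and `nullity_matrix` are hypothetical helpers, and the four rows are a subset of the Dataset A sample restricted to three columns.

```python
def nullity_by_column(rows):
    """Return {column: number of missing values} for a list of dict rows."""
    columns = list(rows[0].keys()) if rows else []
    return {
        col: sum(1 for row in rows if row.get(col) is None)
        for col in columns
    }


def nullity_matrix(rows):
    """Return one list of booleans per row: True where a value is present."""
    columns = list(rows[0].keys()) if rows else []
    return [[row.get(col) is not None for col in columns] for row in rows]


# Subset of the Dataset A sample rows; None marks a NaN cell.
rows = [
    {"Name": "Molson, Mr. Harry Markland", "Age": 55.0, "Cabin": "C30"},
    {"Name": "Funk, Miss. Annie Clemmer", "Age": 38.0, "Cabin": None},
    {"Name": "Willey, Mr. Edward", "Age": None, "Cabin": None},
    {"Name": "Beckwith, Mr. Richard Leonard", "Age": 37.0, "Cabin": "D35"},
]

print(nullity_by_column(rows))  # {'Name': 0, 'Age': 1, 'Cabin': 2}
for line in nullity_matrix(rows):
    # '#' for a present value, '.' for a missing one
    print("".join("#" if present else "." for present in line))
```

The per-column counts correspond to the bar view, and the printed grid is a text stand-in for the matrix view.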

Sample

Dataset A

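(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
338 | 339 | 1 | 3 | Dahl, Mr. Karl Edwart | male | 45.00 | 0 | 0 | 7598 | 8.050 | NaN | S
92 | 93 | 0 | 1 | Chaffee, Mr. Herbert Fuller | male | 46.00 | 1 | 0 | W.E.P. 5734 | 61.175 | E31 | S
211 | 212 | 1 | 2 | Cameron, Miss. Clear Annie | female | 35.00 | 0 | 0 | F.C.C. 13528 | 21.000 | NaN | S
139 | 140 | 0 | 1 | Giglio, Mr. Victor | male | 24.00 | 0 | 0 | PC 17593 | 79.200 | B86 | C
831 | 832 | 1 | 2 | Richards, Master. George Sibley | male | 0.83 | 1 | 1 | 29106 | 18.750 | NaN | S
276 | 277 | 0 | 3 | Lindblom, Miss. Augusta Charlotta | female | 45.00 | 0 | 0 | 347073 | 7.750 | NaN | S
439 | 440 | 0 | 2 | Kvillner, Mr. Johan Henrik Johannesson | male | 31.00 | 0 | 0 | C.A. 18723 | 10.500 | NaN | S
635 | 636 | 1 | 2 | Davis, Miss. Mary | female | 28.00 | 0 | 0 | 237668 | 13.000 | NaN | S
614 | 615 | 0 | 3 | Brocklebank, Mr. William Alfred | male | 35.00 | 0 | 0 | 364512 | 8.050 | NaN | S
346 | 347 | 1 | 2 | Smith, Miss. Marion Elsie | female | 40.00 | 0 | 0 | 31418 | 13.000 | NaN | S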

Dataset B

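(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
45 | 46 | 0 | 3 | Rogers, Mr. William John | male | NaN | 0 | 0 | S.C./A.4. 23567 | 8.0500 | NaN | S
226 | 227 | 1 | 2 | Mellors, Mr. William John | male | 19.0 | 0 | 0 | SW/PP 751 | 10.5000 | NaN | S
678 | 679 | 0 | 3 | Goodwin, Mrs. Frederick (Augusta Tyler) | female | 43.0 | 1 | 6 | CA 2144 | 46.9000 | NaN | S
197 | 198 | 0 | 3 | Olsen, Mr. Karl Siegwart Andreas | male | 42.0 | 0 | 1 | 4579 | 8.4042 | NaN | S
15 | 16 | 1 | 2 | Hewlett, Mrs. (Mary D Kingcome) | female | 55.0 | 0 | 0 | 248706 | 16.0000 | NaN | S
280 | 281 | 0 | 3 | Duane, Mr. Frank | male | 65.0 | 0 | 0 | 336439 | 7.7500 | NaN | Q
846 | 847 | 0 | 3 | Sage, Mr. Douglas Bullen | male | NaN | 8 | 2 | CA. 2343 | 69.5500 | NaN | S
638 | 639 | 0 | 3 | Panula, Mrs. Juha (Maria Emilia Ojala) | female | 41.0 | 0 | 5 | 3101295 | 39.6875 | NaN | S
241 | 242 | 1 | 3 | Murphy, Miss. Katherine "Kate" | female | NaN | 1 | 0 | 367230 | 15.5000 | NaN | Q
87 | 88 | 0 | 3 | Slocovski, Mr. Selman Francis | male | NaN | 0 | 0 | SOTON/OQ 392086 | 8.0500 | NaN | S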

Dataset A

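(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
333 | 334 | 0 | 3 | Vander Planke, Mr. Leo Edmondus | male | 16.0 | 2 | 0 | 345764 | 18.0000 | NaN | S
13 | 14 | 0 | 3 | Andersson, Mr. Anders Johan | male | 39.0 | 1 | 5 | 347082 | 31.2750 | NaN | S
492 | 493 | 0 | 1 | Molson, Mr. Harry Markland | male | 55.0 | 0 | 0 | 113787 | 30.5000 | C30 | S
399 | 400 | 1 | 2 | Trout, Mrs. William H (Jessie L) | female | 28.0 | 0 | 0 | 240929 | 12.6500 | NaN | S
357 | 358 | 0 | 2 | Funk, Miss. Annie Clemmer | female | 38.0 | 0 | 0 | 237671 | 13.0000 | NaN | S
609 | 610 | 1 | 1 | Shutes, Miss. Elizabeth W | female | 40.0 | 0 | 0 | PC 17582 | 153.4625 | C125 | S
648 | 649 | 0 | 3 | Willey, Mr. Edward | male | NaN | 0 | 0 | S.O./P.P. 751 | 7.5500 | NaN | S
722 | 723 | 0 | 2 | Gillespie, Mr. William Henry | male | 34.0 | 0 | 0 | 12233 | 13.0000 | NaN | S
248 | 249 | 1 | 1 | Beckwith, Mr. Richard Leonard | male | 37.0 | 1 | 1 | 11751 | 52.5542 | D35 | S
567 | 568 | 0 | 3 | Palsson, Mrs. Nils (Alma Cornelia Berglund) | female | 29.0 | 0 | 4 | 349909 | 21.0750 | NaN | S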

Dataset B

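(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
299 | 300 | 1 | 1 | Baxter, Mrs. James (Helene DeLaudeniere Chaput) | female | 50.0 | 0 | 1 | PC 17558 | 247.5208 | B58 B60 | C
503 | 504 | 0 | 3 | Laitinen, Miss. Kristina Sofia | female | 37.0 | 0 | 0 | 4135 | 9.5875 | NaN | S
609 | 610 | 1 | 1 | Shutes, Miss. Elizabeth W | female | 40.0 | 0 | 0 | PC 17582 | 153.4625 | C125 | S
582 | 583 | 0 | 2 | Downton, Mr. William James | male | 54.0 | 0 | 0 | 28403 | 26.0000 | NaN | S
765 | 766 | 1 | 1 | Hogeboom, Mrs. John C (Anna Andrews) | female | 51.0 | 1 | 0 | 13502 | 77.9583 | D11 | S
860 | 861 | 0 | 3 | Hansen, Mr. Claus Peter | male | 41.0 | 2 | 0 | 350026 | 14.1083 | NaN | S
785 | 786 | 0 | 3 | Harmer, Mr. Abraham (David Lishin) | male | 25.0 | 0 | 0 | 374887 | 7.2500 | NaN | S
422 | 423 | 0 | 3 | Zimmerman, Mr. Leo | male | 29.0 | 0 | 0 | 315082 | 7.8750 | NaN | S
516 | 517 | 1 | 2 | Lemore, Mrs. (Amelia Milley) | female | 34.0 | 0 | 0 | C.A. 34260 | 10.5000 | F33 | S
339 | 340 | 0 | 1 | Blackwell, Mr. Stephen Weart | male | 45.0 | 0 | 0 | 113784 | 35.5000 | T | S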

Duplicate rows

Dataset A

PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked# duplicates
Dataset does not contain duplicate rows.

Dataset B

PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked# duplicates
Dataset does not contain duplicate rows.
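
For readers who want to repeat the duplicate check on their own data, here is a minimal sketch. It assumes a row counts as a duplicate only when every column value matches another row exactly; `duplicate_rows` is a hypothetical helper and the example rows are made up for illustration.

```python
from collections import Counter


def duplicate_rows(rows):
    """Return {row: occurrences} for rows that appear more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


# Made-up rows: (PassengerId, Name, Embarked)
unique_rows = [
    (1, "Passenger One", "S"),
    (2, "Passenger Two", "C"),
]
repeated_rows = unique_rows + [(2, "Passenger Two", "C")]

print(duplicate_rows(unique_rows))    # {}
print(duplicate_rows(repeated_rows))  # {(2, 'Passenger Two', 'C'): 2}
```

An empty result corresponds to the "Dataset does not contain duplicate rows." message above. Because PassengerId is unique in both datasets, no two complete rows can be identical.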